The tasks included in this report are:
unique(measure_labels$task_group)
[1] "adaptive_n_back" "attention_network_task"
[3] "choice_reaction_time" "directed_forgetting"
[5] "dot_pattern_expectancy" "local_global_letter"
[7] "motor_selective_stop_signal" "recent_probes"
[9] "shape_matching" "simon"
[11] "stim_selective_stop_signal" "stop_signal"
[13] "stroop" "threebytwo"
Our analyses span 512 different measures.
NOTE: We found that HDDM parameters for T1 data do not differ in any detectable way depending on whether they are fit using the full T1 sample (n=522) or only for the subset who completed the T2 battery as well (n=150). Therefore the T1 HDDM parameters in this report are based on fits from n=150. Details of these analyses can be found here.
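The full-sample vs. subset check described above can be sketched as follows. This is a hedged illustration on simulated data, not the report's actual code; the variable names and the use of a simple correlation plus mean difference are assumptions.

```r
set.seed(3)

# Simulated HDDM point estimates for the n=150 subset under both fits
# (stand-ins; the real estimates come from the HDDM fits described above)
full_fit   = rnorm(150, mean = 2, sd = 0.5)   # estimates from the n=522 fit
subset_fit = full_fit + rnorm(150, sd = 0.05) # estimates from the n=150-only fit

# If the two fits agree, the estimates correlate near 1
# and show no systematic shift
cor(full_fit, subset_fit)
mean(full_fit - subset_fit)
```

In the actual analysis this comparison would be run per parameter (drift, threshold, non-decision time) rather than on a single simulated vector.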
Plot reliability point estimates comparing DDM measures to raw measures, faceted by contrast vs. non-contrast measures.
fig_name = 'ddmvsraw_point.jpeg'
knitr::include_graphics(paste0(fig_path, fig_name))
Plot averaged bootstrapped reliability estimates per measure comparing DDM measures to raw measures, faceted by contrast vs. non-contrast measures.
fig_name = 'ddmvsraw_boot.jpeg'
knitr::include_graphics(paste0(fig_path, fig_name))
Model testing whether the reliability of raw measures differs from that of DDM estimates and whether contrast measures differ from non-contrast measures.
Checking whether the fixed effects of raw vs. DDM and contrast vs. non-contrast, as well as their interaction, are necessary.
Conclusion: The interactive model is best.
mer1 = lmer(icc ~ ddm_raw + (1|dv), boot_df %>% filter(rt_acc != "other"))
mer1a = lmer(icc ~ overall_difference + (1|dv), boot_df %>% filter(rt_acc != "other"))
mer2 = lmer(icc ~ ddm_raw + overall_difference + (1|dv), boot_df %>% filter(rt_acc != "other"))
mer3 = lmer(icc ~ ddm_raw * overall_difference + (1|dv), boot_df %>% filter(rt_acc != "other"))
anova(mer1, mer2, mer3)
refitting model(s) with ML (instead of REML)
anova(mer1a, mer2, mer3)
refitting model(s) with ML (instead of REML)
rm(mer1, mer2, mer1a)
Raw measures do not significantly differ from DDM parameters in their reliability, but non-contrast measures are significantly more reliable than contrast and condition measures.
summary(mer3)
Linear mixed model fit by REML ['lmerMod']
Formula: icc ~ ddm_raw * overall_difference + (1 | dv)
Data: boot_df %>% filter(rt_acc != "other")
REML criterion at convergence: -654256
Scaled residuals:
Min 1Q Median 3Q Max
-130.34 -0.36 0.03 0.40 17.35
Random effects:
Groups Name Variance Std.Dev.
dv (Intercept) 0.0518 0.228
Residual 0.0162 0.127
Number of obs: 512000, groups: dv, 512
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.5947 0.0245 24.23
ddm_rawraw -0.0329 0.0395 -0.83
overall_differencecontrast -0.3660 0.0366 -9.99
overall_differencecondition -0.0865 0.0304 -2.84
ddm_rawraw:overall_differencecontrast 0.0790 0.0597 1.32
ddm_rawraw:overall_differencecondition -0.0743 0.0490 -1.52
Correlation of Fixed Effects:
(Intr) ddm_rw ovrll_dffrnccnt ovrll_dffrnccnd
ddm_rawraw -0.621
ovrll_dffrnccnt -0.670 0.416
ovrll_dffrnccnd -0.806 0.501 0.540
ddm_rwrw:vrll_dffrnccnt 0.411 -0.662 -0.614 -0.331
ddm_rwrw:vrll_dffrnccnd 0.501 -0.807 -0.336 -0.621
ddm_rwrw:vrll_dffrnccnt
ddm_rawraw
ovrll_dffrnccnt
ovrll_dffrnccnd
ddm_rwrw:vrll_dffrnccnt
ddm_rwrw:vrll_dffrnccnd 0.534
What is the best individual difference measure for tasks that have both raw and DDM parameters?
Even though the DDM parameters are not significantly less reliable overall, the most reliable measure is more frequently a raw measure. For some tasks an EZ estimate is the best measure. Regardless of raw vs. DDM, the best measure is always a non-contrast measure.
rel_df %>%
group_by(task_group) %>%
filter(icc == max(icc)) %>%
select(task_group, everything())
fig_name = 'ddmvsraw_varsubs.jpeg'
knitr::include_graphics(paste0(fig_path, fig_name))
fig_name = 'ddmvsraw_varresid.jpeg'
knitr::include_graphics(paste0(fig_path, fig_name))
Model testing whether the percentage of between-subjects variance of raw measures differs from that of DDM estimates and whether contrast measures differ from non-contrast measures.
Checking whether the fixed effects of raw vs. DDM and contrast vs. non-contrast, as well as their interaction, are necessary.
Conclusion: The additive model with both fixed effects is best.
mer1 = lmer(var_subs_pct ~ ddm_raw + (1|dv), boot_df %>% filter(rt_acc != "other"))
mer1a = lmer(var_subs_pct ~ overall_difference + (1|dv), boot_df %>% filter(rt_acc != "other"))
mer2 = lmer(var_subs_pct ~ ddm_raw + overall_difference + (1|dv), boot_df %>% filter(rt_acc != "other"))
mer3 = lmer(var_subs_pct ~ ddm_raw * overall_difference + (1|dv), boot_df %>% filter(rt_acc != "other"))
anova(mer1, mer2, mer3)
refitting model(s) with ML (instead of REML)
anova(mer1a, mer2, mer3)
refitting model(s) with ML (instead of REML)
rm(mer1, mer1a, mer3)
Contrast measures have a lower percentage of between-subjects variance (i.e., they are worse individual difference measures). Raw and DDM measures do not differ significantly.
summary(mer2)
Linear mixed model fit by REML ['lmerMod']
Formula: var_subs_pct ~ ddm_raw + overall_difference + (1 | dv)
Data: boot_df %>% filter(rt_acc != "other")
REML criterion at convergence: 3845550
Scaled residuals:
Min 1Q Median 3Q Max
-4.485 -0.657 0.065 0.669 4.841
Random effects:
Groups Name Variance Std.Dev.
dv (Intercept) 197 14.0
Residual 106 10.3
Number of obs: 512000, groups: dv, 512
Fixed effects:
Estimate Std. Error t value
(Intercept) 50.59 1.29 39.4
ddm_rawraw 2.30 1.28 1.8
overall_differencecontrast -8.02 1.78 -4.5
overall_differencecondition -2.57 1.47 -1.7
Correlation of Fixed Effects:
(Intr) ddm_rw ovrll_dffrnccnt
ddm_rawraw -0.383
ovrll_dffrnccnt -0.619 0.012
ovrll_dffrnccnd -0.745 -0.001 0.536
Model testing whether the percentage of residual variance of raw measures differs from that of DDM estimates and whether contrast measures differ from non-contrast measures.
Checking whether the fixed effects of raw vs. DDM and contrast vs. non-contrast, as well as their interaction, are necessary.
Conclusion: The interactive model is best.
mer1 = lmer(var_resid_pct ~ ddm_raw + (1|dv), boot_df %>% filter(rt_acc != "other"))
mer1a = lmer(var_resid_pct ~ overall_difference + (1|dv), boot_df %>% filter(rt_acc != "other"))
mer2 = lmer(var_resid_pct ~ ddm_raw + overall_difference + (1|dv), boot_df %>% filter(rt_acc != "other"))
mer3 = lmer(var_resid_pct ~ ddm_raw * overall_difference + (1|dv), boot_df %>% filter(rt_acc != "other"))
anova(mer1, mer2, mer3)
refitting model(s) with ML (instead of REML)
anova(mer1a, mer2, mer3)
refitting model(s) with ML (instead of REML)
rm(mer1, mer1a, mer2)
Both contrast and condition measures have higher residual variance. Raw and DDM measures do not differ.
summary(mer3)
Linear mixed model fit by REML ['lmerMod']
Formula: var_resid_pct ~ ddm_raw * overall_difference + (1 | dv)
Data: boot_df %>% filter(rt_acc != "other")
REML criterion at convergence: 3308034
Scaled residuals:
Min 1Q Median 3Q Max
-6.179 -0.562 0.016 0.604 7.682
Random effects:
Groups Name Variance Std.Dev.
dv (Intercept) 64.7 8.05
Residual 37.2 6.10
Number of obs: 512000, groups: dv, 512
Fixed effects:
Estimate Std. Error t value
(Intercept) 19.460 0.868 22.42
ddm_rawraw 1.279 1.397 0.92
overall_differencecontrast 12.009 1.295 9.27
overall_differencecondition 2.783 1.076 2.59
ddm_rawraw:overall_differencecontrast -1.387 2.111 -0.66
ddm_rawraw:overall_differencecondition 2.968 1.732 1.71
Correlation of Fixed Effects:
(Intr) ddm_rw ovrll_dffrnccnt ovrll_dffrnccnd
ddm_rawraw -0.621
ovrll_dffrnccnt -0.670 0.416
ovrll_dffrnccnd -0.806 0.501 0.540
ddm_rwrw:vrll_dffrnccnt 0.411 -0.662 -0.614 -0.331
ddm_rwrw:vrll_dffrnccnd 0.501 -0.807 -0.336 -0.621
ddm_rwrw:vrll_dffrnccnt
ddm_rawraw
ovrll_dffrnccnt
ovrll_dffrnccnd
ddm_rwrw:vrll_dffrnccnt
ddm_rwrw:vrll_dffrnccnd 0.534
rm(mer3)
Differences in HDDM parameter reliability for T1 data using either n=522 or n=150 were examined in a separate report on T1 HDDM parameters. No meaningful differences were found between the two sample sizes.
But even 150 is a large sample for psychological studies, especially for the forced-choice reaction time tasks included in this report. Here we look at how the reliability of raw and DDM measures changes for sample sizes that are more common in studies using these tasks (10, 15, 20, 25, 50, 75, 100, 125).
Note: We are not refitting HDDMs for each of these sample sizes because (a) there were no differences in parameter stability for n=150 vs. n=522, and (b) a more comprehensive comparison using non-hierarchical estimates and model fit indices will follow. [Should I revisit this? n=150 and n=522 might both be too large to lead to changes in parameter estimates, while the smaller samples that are more common in psych studies might sway estimates more. If that were the case, wouldn't we expect the comparison of non-hierarchical vs. hierarchical estimates to show the largest difference? And if there is no difference there, then we don't have to worry about it?]
Note: Some variables do not have enough variance to calculate reliability for different sample sizes. These variables are:
>stroop.post_error_slowing
>simon.std_rt_error
>shape_matching.post_error_slowing
>directed_forgetting.post_error_slowing
>choice_reaction_time.post_error_slowing
>choice_reaction_time.std_rt_error
>dot_pattern_expectancy.post_error_slowing
>motor_selective_stop_signal.go_rt_std_error
>motor_selective_stop_signal.go_rt_error
>attention_network_task.post_error_slowing
>recent_probes.post_error_slowing
>simon.post_error_slowing
>dot_pattern_expectancy.BY_errors
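The subsampling procedure can be sketched as follows. This is a simplified, self-contained stand-in for the logic in `ddm_reldf_sample_size.R`, using simulated test-retest data and a hand-rolled consistency ICC(3,1); all names and values here are illustrative assumptions, not the report's actual code.

```r
set.seed(1)

# Simulated test-retest scores for 150 subjects (illustration only)
full_sample = data.frame(t1 = rnorm(150))
full_sample$t2 = 0.7 * full_sample$t1 + rnorm(150, sd = sqrt(1 - 0.7^2))

# Consistency ICC(3,1) for k = 2 measurements, from the
# two-way ANOVA decomposition of the score matrix
icc31 = function(x, y) {
  scores = cbind(x, y)
  n = nrow(scores); k = ncol(scores)
  subj_means = rowMeans(scores)
  meas_means = colMeans(scores)
  grand = mean(scores)
  ss_subj = k * sum((subj_means - grand)^2)
  ss_meas = n * sum((meas_means - grand)^2)
  ss_err  = sum((scores - grand)^2) - ss_subj - ss_meas
  ms_subj = ss_subj / (n - 1)
  ms_err  = ss_err / ((n - 1) * (k - 1))
  (ms_subj - ms_err) / (ms_subj + (k - 1) * ms_err)
}

# For each sample size, draw 100 random subsamples and average the ICCs
sample_sizes = c(10, 15, 20, 25, 50, 75, 100, 125)
mean_icc = sapply(sample_sizes, function(n) {
  mean(replicate(100, {
    idx = sample(nrow(full_sample), n)
    icc31(full_sample$t1[idx], full_sample$t2[idx])
  }))
})
names(mean_icc) = sample_sizes
round(mean_icc, 2)
```

The actual report computes ICCs per measure and stores them in `rel_df_sample_size`; this sketch only mirrors the resampling logic for a single simulated measure.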
source('/Users/zeynepenkavi/Dropbox/PoldrackLab/SRO_DDM_Analyses/code/workspace_scripts/ddm_reldf_sample_size.R')
Warning: Column `dv` joining factor and character vector, coercing into
character vector
Does the mean reliability change with sample size?
Yes. The larger the sample size, the more reliable a given measure is on average. The largest increase in reliability occurs when shifting from 25 to 50 subjects. This is important because many studies using these measures have sample sizes <50 per group.
fig_name = 'rel_by_samplesize.jpeg'
knitr::include_graphics(paste0(fig_path, fig_name))
When <15 subjects are used to calculate the measures, they are significantly less reliable.
summary(lmer(icc ~ factor(sample_size) + (1|dv) + (1|iteration), rel_df_sample_size))
Linear mixed model fit by REML ['lmerMod']
Formula: icc ~ factor(sample_size) + (1 | dv) + (1 | iteration)
Data: rel_df_sample_size
REML criterion at convergence: 2823478
Scaled residuals:
Min 1Q Median 3Q Max
-605.6 0.0 0.0 0.0 0.4
Random effects:
Groups Name Variance Std.Dev.
dv (Intercept) 0.06528 0.2555
iteration (Intercept) 0.00427 0.0653
Residual 65.64367 8.1021
Number of obs: 402034, groups: dv, 505; iteration, 100
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.1251 0.0386 3.24
factor(sample_size)15 0.2470 0.0513 4.81
factor(sample_size)20 0.2780 0.0513 5.42
factor(sample_size)25 0.2947 0.0512 5.76
factor(sample_size)50 0.3222 0.0512 6.30
factor(sample_size)75 0.3303 0.0512 6.45
factor(sample_size)100 0.3339 0.0512 6.53
factor(sample_size)125 0.3353 0.0512 6.55
Correlation of Fixed Effects:
(Intr) f(_)15 f(_)20 f(_)25 f(_)50 f(_)75 f(_)10
fctr(sm_)15 -0.665
fctr(sm_)20 -0.665 0.500
fctr(sm_)25 -0.667 0.502 0.502
fctr(sm_)50 -0.667 0.502 0.502 0.504
fctr(sm_)75 -0.667 0.502 0.502 0.504 0.504
fctr(s_)100 -0.667 0.502 0.502 0.504 0.504 0.504
fctr(s_)125 -0.667 0.502 0.502 0.504 0.504 0.504 0.504
Are there differences between any other sample sizes? This ignores the differences between variables, but there seem to be differences only between n=10 and all larger sample sizes.
with(rel_df_sample_size_summary, pairwise.t.test(mean_icc, sample_size, p.adjust.method = "bonferroni"))
Pairwise comparisons using t tests with pooled SD
data: mean_icc and sample_size
10 15 20 25 50 75 100
15 2e-04 - - - - - -
20 9e-06 1 - - - - -
25 2e-06 1 1 - - - -
50 9e-08 1 1 1 - - -
75 4e-08 1 1 1 1 - -
100 3e-08 1 1 1 1 1 -
125 2e-08 1 1 1 1 1 1
P value adjustment method: bonferroni
Does the change in reliability with sample size vary by variable type?
No. The changes do not differ for raw vs. DDM measures, or for contrast and condition measures compared to non-contrast measures. Contrast and condition measures are just less reliable overall.
summary(lmer(icc ~ sample_size * ddm_raw + (1|dv) + (1|iteration), rel_df_sample_size))
Linear mixed model fit by REML ['lmerMod']
Formula: icc ~ sample_size * ddm_raw + (1 | dv) + (1 | iteration)
Data: rel_df_sample_size
REML criterion at convergence: 2805433
Scaled residuals:
Min 1Q Median 3Q Max
-603.4 0.0 0.0 0.0 0.4
Random effects:
Groups Name Variance Std.Dev.
dv (Intercept) 0.06500 0.2550
iteration (Intercept) 0.00433 0.0658
Residual 66.14286 8.1328
Number of obs: 399034, groups: dv, 499; iteration, 100
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.349578 0.030865 11.33
sample_size 0.001260 0.000400 3.15
ddm_rawraw -0.105882 0.049825 -2.13
sample_size:ddm_rawraw 0.000831 0.000662 1.26
Correlation of Fixed Effects:
(Intr) smpl_s ddm_rw
sample_size -0.681
ddm_rawraw -0.591 0.422
smpl_sz:dd_ 0.412 -0.605 -0.697
summary(lmer(icc ~ sample_size * overall_difference + (1|dv) + (1|iteration), rel_df_sample_size))
Linear mixed model fit by REML ['lmerMod']
Formula:
icc ~ sample_size * overall_difference + (1 | dv) + (1 | iteration)
Data: rel_df_sample_size
REML criterion at convergence: 2805380
Scaled residuals:
Min 1Q Median 3Q Max
-603.4 0.0 0.0 0.0 0.3
Random effects:
Groups Name Variance Std.Dev.
dv (Intercept) 0.04637 0.2153
iteration (Intercept) 0.00433 0.0658
Residual 66.14275 8.1328
Number of obs: 399034, groups: dv, 499; iteration, 100
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.518096 0.044356 11.68
sample_size 0.000621 0.000602 1.03
overall_differencecontrast -0.459839 0.065919 -6.98
overall_differencecondition -0.211001 0.054844 -3.85
sample_size:overall_differencecontrast 0.001230 0.000905 1.36
sample_size:overall_differencecondition 0.001343 0.000753 1.78
Correlation of Fixed Effects:
(Intr) smpl_s ovrll_dffrnccnt ovrll_dffrnccnd
sample_size -0.713
ovrll_dffrnccnt -0.658 0.480
ovrll_dffrnccnd -0.791 0.577 0.532
smpl_sz:vrll_dffrnccnt 0.475 -0.665 -0.721 -0.384
smpl_sz:vrll_dffrnccnd 0.571 -0.800 -0.384 -0.721
smpl_sz:vrll_dffrnccnt
sample_size
ovrll_dffrnccnt
ovrll_dffrnccnd
smpl_sz:vrll_dffrnccnt
smpl_sz:vrll_dffrnccnd 0.532
Does the variability of reliability change with sample size?
Trending but not significant. The SEMs are always fairly small.
rel_df_sample_size_summary %>%
na.exclude() %>%
ggplot(aes(factor(sample_size), sem_icc))+
geom_line(aes(group = dv, color=ddm_raw), alpha = 0.1)+
facet_wrap(~overall_difference)+
ylab("Standard error of mean of reliability \n of 100 samples of size n")+
xlab("Sample size")+
theme(legend.title = element_blank(),
legend.position = "bottom")+
ylim(0,0.3)
Warning: Removed 16 rows containing missing values (geom_path).
summary(lmer(sem_icc ~ sample_size * overall_difference + (1|dv), rel_df_sample_size_summary))
Linear mixed model fit by REML ['lmerMod']
Formula: sem_icc ~ sample_size * overall_difference + (1 | dv)
Data: rel_df_sample_size_summary
REML criterion at convergence: 9708
Scaled residuals:
Min 1Q Median 3Q Max
-0.12 -0.06 -0.02 0.01 60.38
Random effects:
Groups Name Variance Std.Dev.
dv (Intercept) 0.000213 0.0146
Residual 0.657690 0.8110
Number of obs: 3992, groups: dv, 499
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.038630 0.039761 0.97
sample_size -0.000334 0.000600 -0.56
overall_differencecontrast 0.048118 0.059791 0.80
overall_differencecondition 0.090710 0.049734 1.82
sample_size:overall_differencecontrast -0.000466 0.000902 -0.52
sample_size:overall_differencecondition -0.000986 0.000750 -1.31
Correlation of Fixed Effects:
(Intr) smpl_s ovrll_dffrnccnt ovrll_dffrnccnd
sample_size -0.792
ovrll_dffrnccnt -0.665 0.527
ovrll_dffrnccnd -0.799 0.633 0.532
smpl_sz:vrll_dffrnccnt 0.527 -0.665 -0.792 -0.421
smpl_sz:vrll_dffrnccnd 0.633 -0.799 -0.421 -0.792
smpl_sz:vrll_dffrnccnt
sample_size
ovrll_dffrnccnt
ovrll_dffrnccnd
smpl_sz:vrll_dffrnccnt
smpl_sz:vrll_dffrnccnd 0.532
Does between-subjects variance change with sample size?
Yes. Between-subjects variance decreases with sample size. This is more pronounced for non-contrast measures.
This goes against my intuitions. Looking at the change in the between-subjects percentage for individual measures, there seems to be a lot of inter-measure variance (even more pronounced below for within-subjects variance). I'm not sure whether something is common to the measures that show increasing between-subjects variability with sample size and separates them from those that show decreasing between-subjects variability with sample size (the slight majority).
tmp = rel_df_sample_size_summary %>%
na.exclude()%>%
group_by(overall_difference, sample_size, ddm_raw) %>%
summarise(mean_var_subs_pct = mean(mean_var_subs_pct, na.rm=T))
rel_df_sample_size_summary %>%
na.exclude() %>%
ggplot(aes(factor(sample_size), mean_var_subs_pct))+
geom_line(aes(group = dv, color=ddm_raw), alpha = 0.1)+
geom_line(data = tmp, aes(factor(sample_size),mean_var_subs_pct, color=ddm_raw, group=ddm_raw))+
geom_point(data = tmp, aes(factor(sample_size),mean_var_subs_pct, color=ddm_raw))+
facet_wrap(~overall_difference)+
ylab("Mean percentage of \n between subjects variance \n of 100 samples of size n")+
xlab("Sample size")+
theme(legend.title = element_blank(),
legend.position = "bottom")
summary(lmer(var_subs_pct ~ factor(sample_size) * overall_difference + (1|dv) + (1|iteration), rel_df_sample_size))
Linear mixed model fit by REML ['lmerMod']
Formula: var_subs_pct ~ factor(sample_size) * overall_difference + (1 |
dv) + (1 | iteration)
Data: rel_df_sample_size
REML criterion at convergence: 3274121
Scaled residuals:
Min 1Q Median 3Q Max
-4.931 -0.705 0.054 0.704 4.571
Random effects:
Groups Name Variance Std.Dev.
dv (Intercept) 109.05 10.44
iteration (Intercept) 1.26 1.12
Residual 212.49 14.58
Number of obs: 399034, groups: dv, 499; iteration, 100
Fixed effects:
Estimate Std. Error
(Intercept) 57.911 0.898
factor(sample_size)15 -0.139 0.175
factor(sample_size)20 -0.467 0.175
factor(sample_size)25 -0.964 0.175
factor(sample_size)50 -3.428 0.174
factor(sample_size)75 -5.652 0.174
factor(sample_size)100 -7.458 0.174
factor(sample_size)125 -9.173 0.174
overall_differencecontrast -14.208 1.340
overall_differencecondition -5.099 1.115
factor(sample_size)15:overall_differencecontrast 0.298 0.262
factor(sample_size)20:overall_differencecontrast 0.611 0.262
factor(sample_size)25:overall_differencecontrast 1.143 0.262
factor(sample_size)50:overall_differencecontrast 3.440 0.262
factor(sample_size)75:overall_differencecontrast 5.385 0.262
factor(sample_size)100:overall_differencecontrast 6.711 0.262
factor(sample_size)125:overall_differencecontrast 8.236 0.262
factor(sample_size)15:overall_differencecondition 0.219 0.218
factor(sample_size)20:overall_differencecondition 0.607 0.218
factor(sample_size)25:overall_differencecondition 0.897 0.218
factor(sample_size)50:overall_differencecondition 1.993 0.218
factor(sample_size)75:overall_differencecondition 2.978 0.218
factor(sample_size)100:overall_differencecondition 3.532 0.218
factor(sample_size)125:overall_differencecondition 4.155 0.218
t value
(Intercept) 64.5
factor(sample_size)15 -0.8
factor(sample_size)20 -2.7
factor(sample_size)25 -5.5
factor(sample_size)50 -19.6
factor(sample_size)75 -32.4
factor(sample_size)100 -42.7
factor(sample_size)125 -52.6
overall_differencecontrast -10.6
overall_differencecondition -4.6
factor(sample_size)15:overall_differencecontrast 1.1
factor(sample_size)20:overall_differencecontrast 2.3
factor(sample_size)25:overall_differencecontrast 4.4
factor(sample_size)50:overall_differencecontrast 13.1
factor(sample_size)75:overall_differencecontrast 20.5
factor(sample_size)100:overall_differencecontrast 25.6
factor(sample_size)125:overall_differencecontrast 31.4
factor(sample_size)15:overall_differencecondition 1.0
factor(sample_size)20:overall_differencecondition 2.8
factor(sample_size)25:overall_differencecondition 4.1
factor(sample_size)50:overall_differencecondition 9.1
factor(sample_size)75:overall_differencecondition 13.7
factor(sample_size)100:overall_differencecondition 16.2
factor(sample_size)125:overall_differencecondition 19.0
Correlation matrix not shown by default, as p = 24 > 12.
Use print(x, correlation=TRUE) or
vcov(x) if you need it
Does within-subjects variance change with sample size?
Yes. Within-subjects variance increases with sample size. This again goes against my intuition, but here the inter-measure differences are even more pronounced. For some measures the difference between the two measurement time points grows the more subjects are tested, while others show a decrease in within-subjects variance with larger sample sizes. I still don't know whether anything distinguishes these two types of measures.
tmp = rel_df_sample_size_summary %>%
na.exclude()%>%
group_by(overall_difference, sample_size, ddm_raw) %>%
summarise(mean_var_ind_pct = mean(mean_var_ind_pct, na.rm=T))
rel_df_sample_size_summary %>%
na.exclude() %>%
ggplot(aes(factor(sample_size), mean_var_ind_pct))+
geom_line(aes(group = dv, color=ddm_raw), alpha = 0.1)+
geom_line(data = tmp, aes(factor(sample_size),mean_var_ind_pct, color=ddm_raw, group=ddm_raw))+
geom_point(data = tmp, aes(factor(sample_size),mean_var_ind_pct, color=ddm_raw))+
facet_wrap(~overall_difference)+
ylab("Mean percentage of \n within subjects variance \n of 100 samples of size n")+
xlab("Sample size")+
theme(legend.title = element_blank(),
legend.position = "bottom")
summary(lmer(var_ind_pct ~ factor(sample_size) * overall_difference + (1|dv) + (1|iteration), rel_df_sample_size))
Linear mixed model fit by REML ['lmerMod']
Formula: var_ind_pct ~ factor(sample_size) * overall_difference + (1 |
dv) + (1 | iteration)
Data: rel_df_sample_size
REML criterion at convergence: 3471555
Scaled residuals:
Min 1Q Median 3Q Max
-4.551 -0.750 -0.171 0.698 4.326
Random effects:
Groups Name Variance Std.Dev.
dv (Intercept) 132.28 11.50
iteration (Intercept) 1.41 1.19
Residual 348.68 18.67
Number of obs: 399034, groups: dv, 499; iteration, 100
Fixed effects:
Estimate Std. Error
(Intercept) 19.234 0.992
factor(sample_size)15 0.504 0.224
factor(sample_size)20 1.149 0.224
factor(sample_size)25 1.844 0.224
factor(sample_size)50 5.282 0.224
factor(sample_size)75 8.154 0.224
factor(sample_size)100 10.486 0.224
factor(sample_size)125 12.721 0.224
overall_differencecontrast 4.621 1.481
overall_differencecondition 1.928 1.232
factor(sample_size)15:overall_differencecontrast -0.606 0.336
factor(sample_size)20:overall_differencecontrast -1.230 0.336
factor(sample_size)25:overall_differencecontrast -1.885 0.336
factor(sample_size)50:overall_differencecontrast -4.844 0.336
factor(sample_size)75:overall_differencecontrast -7.225 0.336
factor(sample_size)100:overall_differencecontrast -8.779 0.336
factor(sample_size)125:overall_differencecontrast -10.817 0.336
factor(sample_size)15:overall_differencecondition -0.297 0.280
factor(sample_size)20:overall_differencecondition -0.634 0.280
factor(sample_size)25:overall_differencecondition -0.928 0.279
factor(sample_size)50:overall_differencecondition -2.201 0.279
factor(sample_size)75:overall_differencecondition -3.192 0.279
factor(sample_size)100:overall_differencecondition -3.630 0.279
factor(sample_size)125:overall_differencecondition -4.312 0.279
t value
(Intercept) 19.4
factor(sample_size)15 2.3
factor(sample_size)20 5.1
factor(sample_size)25 8.2
factor(sample_size)50 23.6
factor(sample_size)75 36.5
factor(sample_size)100 46.9
factor(sample_size)125 56.9
overall_differencecontrast 3.1
overall_differencecondition 1.6
factor(sample_size)15:overall_differencecontrast -1.8
factor(sample_size)20:overall_differencecontrast -3.7
factor(sample_size)25:overall_differencecontrast -5.6
factor(sample_size)50:overall_differencecontrast -14.4
factor(sample_size)75:overall_differencecontrast -21.5
factor(sample_size)100:overall_differencecontrast -26.1
factor(sample_size)125:overall_differencecontrast -32.2
factor(sample_size)15:overall_differencecondition -1.1
factor(sample_size)20:overall_differencecondition -2.3
factor(sample_size)25:overall_differencecondition -3.3
factor(sample_size)50:overall_differencecondition -7.9
factor(sample_size)75:overall_differencecondition -11.4
factor(sample_size)100:overall_differencecondition -13.0
factor(sample_size)125:overall_differencecondition -15.4
Correlation matrix not shown by default, as p = 24 > 12.
Use print(x, correlation=TRUE) or
vcov(x) if you need it
Does residual variance change with sample size?
tmp = rel_df_sample_size_summary %>%
na.exclude()%>%
group_by(overall_difference, sample_size, ddm_raw) %>%
summarise(mean_var_resid_pct = mean(mean_var_resid_pct, na.rm=T))
rel_df_sample_size_summary %>%
na.exclude() %>%
ggplot(aes(factor(sample_size), mean_var_resid_pct))+
geom_line(aes(group = dv, color=ddm_raw), alpha = 0.1)+
geom_line(data = tmp, aes(factor(sample_size),mean_var_resid_pct, color=ddm_raw, group=ddm_raw))+
geom_point(data = tmp, aes(factor(sample_size),mean_var_resid_pct, color=ddm_raw))+
facet_wrap(~overall_difference)+
ylab("Mean percentage of residual variance \n of 100 samples of size n")+
xlab("Sample size")+
theme(legend.title = element_blank(),
legend.position = "bottom")
summary(lmer(var_resid_pct ~ factor(sample_size) * overall_difference + (1|dv) + (1|iteration), rel_df_sample_size))
Linear mixed model fit by REML ['lmerMod']
Formula: var_resid_pct ~ factor(sample_size) * overall_difference + (1 |
dv) + (1 | iteration)
Data: rel_df_sample_size
REML criterion at convergence: 2942477
Scaled residuals:
Min 1Q Median 3Q Max
-4.591 -0.624 -0.037 0.560 8.251
Random effects:
Groups Name Variance Std.Dev.
dv (Intercept) 39.93 6.319
iteration (Intercept) 0.44 0.663
Residual 92.57 9.621
Number of obs: 399034, groups: dv, 499; iteration, 100
Fixed effects:
Estimate Std. Error
(Intercept) 22.8548 0.5443
factor(sample_size)15 -0.3645 0.1153
factor(sample_size)20 -0.6816 0.1152
factor(sample_size)25 -0.8793 0.1152
factor(sample_size)50 -1.8549 0.1152
factor(sample_size)75 -2.5014 0.1152
factor(sample_size)100 -3.0273 0.1152
factor(sample_size)125 -3.5482 0.1152
overall_differencecontrast 9.5866 0.8124
overall_differencecondition 3.1707 0.6758
factor(sample_size)15:overall_differencecontrast 0.3080 0.1731
factor(sample_size)20:overall_differencecontrast 0.6197 0.1731
factor(sample_size)25:overall_differencecontrast 0.7418 0.1731
factor(sample_size)50:overall_differencecontrast 1.4046 0.1730
factor(sample_size)75:overall_differencecontrast 1.8407 0.1730
factor(sample_size)100:overall_differencecontrast 2.0673 0.1730
factor(sample_size)125:overall_differencecontrast 2.5810 0.1730
factor(sample_size)15:overall_differencecondition 0.0784 0.1440
factor(sample_size)20:overall_differencecondition 0.0268 0.1440
factor(sample_size)25:overall_differencecondition 0.0313 0.1440
factor(sample_size)50:overall_differencecondition 0.2082 0.1440
factor(sample_size)75:overall_differencecondition 0.2141 0.1440
factor(sample_size)100:overall_differencecondition 0.0988 0.1440
factor(sample_size)125:overall_differencecondition 0.1573 0.1440
t value
(Intercept) 42.0
factor(sample_size)15 -3.2
factor(sample_size)20 -5.9
factor(sample_size)25 -7.6
factor(sample_size)50 -16.1
factor(sample_size)75 -21.7
factor(sample_size)100 -26.3
factor(sample_size)125 -30.8
overall_differencecontrast 11.8
overall_differencecondition 4.7
factor(sample_size)15:overall_differencecontrast 1.8
factor(sample_size)20:overall_differencecontrast 3.6
factor(sample_size)25:overall_differencecontrast 4.3
factor(sample_size)50:overall_differencecontrast 8.1
factor(sample_size)75:overall_differencecontrast 10.6
factor(sample_size)100:overall_differencecontrast 11.9
factor(sample_size)125:overall_differencecontrast 14.9
factor(sample_size)15:overall_differencecondition 0.5
factor(sample_size)20:overall_differencecondition 0.2
factor(sample_size)25:overall_differencecondition 0.2
factor(sample_size)50:overall_differencecondition 1.4
factor(sample_size)75:overall_differencecondition 1.5
factor(sample_size)100:overall_differencecondition 0.7
factor(sample_size)125:overall_differencecondition 1.1
Correlation matrix not shown by default, as p = 24 > 12.
Use print(x, correlation=TRUE) or
vcov(x) if you need it
Conclusion: Larger samples are better for reliability, but not necessarily always for the same reasons; for some variables this is due to increasing between-subjects variance, while for others it's due to decreasing residual variance (?).
rm(rel_df_sample_size, rel_df_sample_size_summary)
The H in HDDM stands for 'hierarchical', denoting the fact that the distribution of the whole sample is incorporated in the priors for the model parameters. This differs from, e.g., EZ-diffusion parameters, which would be the same for a given subject's data regardless of the rest of the sample. One can ask whether this H indeed has a measurable impact.
HDDM and EZ parameter estimates might differ due to other differences in the estimation process as well. Therefore, to evaluate whether the 'H' leads to meaningful changes in parameters, we compare the same models fit on the same data using either the whole sample for the hierarchical structure or only each single subject's data. The latter estimates are referred to as 'flat' estimates.
retest_hddm_flat = read.csv(paste0(retest_data_path,'retest_hddm_flat.csv'))
retest_hddm_flat = retest_hddm_flat %>% rename(sub_id = subj_id)
test_hddm_flat = read.csv(paste0(retest_data_path,'/t1_data/t1_hddm_flat.csv'))
test_hddm_flat = test_hddm_flat %>% rename(sub_id = subj_id)
# numeric_cols = get_numeric_cols()
# Check if all the variables are there (no for now)
# sum(names(retest_hddm_flat) %in% numeric_cols) == length(names(retest_hddm_flat))
# names(retest_hddm_flat)[which(names(retest_hddm_flat) %in% numeric_cols == FALSE)]
# sum(names(test_hddm_flat) %in% numeric_cols) == length(names(test_hddm_flat))
# names(test_hddm_flat)[which(names(test_hddm_flat) %in% numeric_cols == FALSE)]
Plot the percent change in parameters between the flat and hierarchical fits at retest and T1 for each of the 3 parameters (using the hierarchical estimate as baseline: by what percent does the flat parameter differ from the hierarchical one?).
For both time points, thresholds and non-decision times are very similar regardless of whether they are estimated hierarchically or without the hierarchy (the difference distribution peaks at 0).
This is also true for the majority of the drift rates. There are, however, almost as many drift rates that change completely when estimated without the hierarchy.
#Should not be necessary later
common_cols = names(retest_hddm_flat)[names(retest_hddm_flat) %in% names(retest_data)]
common_cols=common_cols[common_cols %in% names(test_hddm_flat)]
retest_hddm_hier = retest_data %>% select(common_cols) %>% mutate(hddm="hierarchical")
test_hddm_hier = test_data %>% select(common_cols) %>% mutate(hddm="hierarchical")
retest_hddm_flat = retest_hddm_flat %>% select(common_cols) %>% mutate(hddm="flat")
test_hddm_flat = test_hddm_flat %>% select(common_cols) %>% mutate(hddm="flat")
retest_flat_difference = rbind(retest_hddm_hier, retest_hddm_flat)
retest_flat_difference = retest_flat_difference %>%
gather(dv, value, -sub_id, -hddm) %>%
spread(hddm, value) %>%
mutate(diff_pct = (hierarchical - flat)/hierarchical*100,
diff_pct = ifelse(diff_pct<(-100), -100, ifelse(diff_pct>100, 100, diff_pct)),
time = "retest",
par = ifelse(grepl("drift", dv), "drift", ifelse(grepl("thresh", dv), "thresh", ifelse(grepl("non_decision", dv), "non_decision", NA))))
test_flat_difference = rbind(test_hddm_hier, test_hddm_flat)
test_flat_difference = test_flat_difference %>%
gather(dv, value, -sub_id, -hddm) %>%
spread(hddm, value) %>%
mutate(diff_pct = (hierarchical - flat)/hierarchical*100,
diff_pct = ifelse(diff_pct<(-100), -100, ifelse(diff_pct>100, 100, diff_pct)),
time = "test",
par = ifelse(grepl("drift", dv), "drift", ifelse(grepl("thresh", dv), "thresh", ifelse(grepl("non_decision", dv), "non_decision", NA))))
flat_difference = rbind(test_flat_difference, retest_flat_difference)
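The test and retest blocks above are identical apart from their labels; a small helper (a sketch, assuming the same column layout, with dplyr::case_when in place of the nested ifelse calls) would avoid the duplication:

```r
library(dplyr)
library(tidyr)

# Sketch of a helper: compute percent change of flat vs. hierarchical
# estimates for one time point, given long-format inputs and a label.
make_flat_difference <- function(hier_df, flat_df, time_label) {
  rbind(hier_df, flat_df) %>%
    gather(dv, value, -sub_id, -hddm) %>%
    spread(hddm, value) %>%
    mutate(diff_pct = (hierarchical - flat) / hierarchical * 100,
           # winsorize extreme percent changes at +/- 100
           diff_pct = pmax(pmin(diff_pct, 100), -100),
           time = time_label,
           par = case_when(grepl("drift", dv) ~ "drift",
                           grepl("thresh", dv) ~ "thresh",
                           grepl("non_decision", dv) ~ "non_decision",
                           TRUE ~ NA_character_))
}

flat_difference = bind_rows(
  make_flat_difference(test_hddm_hier, test_hddm_flat, "test"),
  make_flat_difference(retest_hddm_hier, retest_hddm_flat, "retest"))
```

case_when evaluates its conditions top to bottom and the first match wins, mirroring the nested ifelse calls.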
flat_difference %>%
ggplot(aes(diff_pct))+
geom_histogram()+
facet_grid(factor(time, levels = c("test", "retest"),labels = c("test", "retest")) ~ par, scales="free")+
xlab("Percent change")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 1022 rows containing non-finite values (stat_bin).
Is the average percentage change different from 0? No.
This model uses the distribution of percentage differences in parameter estimates for drift rates as the baseline. A more appropriate model would test whether each of the three distributions differs from 0. The conclusion should not change: the intercept suggests that the mean of the drift rate difference distribution is at 0, and this does not differ from the distributions for the other parameters at either time point.
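The more appropriate model described above could be sketched with cell-means coding (my suggestion, not the analysis actually run): dropping the global intercept yields one fixed effect per time-by-parameter cell, and each t value then tests that cell's mean against 0.

```r
library(lme4)

# Sketch: cell-means coding. With no global intercept, each fixed effect
# is the mean percent change for one time x parameter cell, so its
# t value tests that cell's mean against 0 directly.
summary(lmer(diff_pct ~ 0 + time:par + (1 | dv), data = flat_difference))
```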
summary(lmer(diff_pct ~ time * par + (1 | dv), flat_difference))
Linear mixed model fit by REML ['lmerMod']
Formula: diff_pct ~ time * par + (1 | dv)
   Data: flat_difference
REML criterion at convergence: 238677
Scaled residuals:
Min 1Q Median 3Q Max
-3.244 -0.258 -0.016 0.213 3.853
Random effects:
Groups Name Variance Std.Dev.
dv (Intercept) 130 11.4
Residual 1443 38.0
Number of obs: 23578, groups: dv, 82
Fixed effects:
Estimate Std. Error t value
(Intercept) -1.19 1.64 -0.73
timetest 1.64 0.62 2.65
parnon_decision 1.65 3.57 0.46
parthresh -1.84 3.39 -0.54
timetest:parnon_decision -1.38 1.35 -1.02
timetest:parthresh -1.71 1.28 -1.33
Correlation of Fixed Effects:
(Intr) timtst prnn_d prthrs tmts:_
timetest -0.191
parnon_dcsn -0.460 0.088
parthresh -0.485 0.093 0.223
tmtst:prnn_ 0.088 -0.459 -0.191 -0.043
tmtst:prthr 0.092 -0.483 -0.042 -0.192 0.222
Do people change in the same way? Mostly.
Plotting the raw parameter estimates from the hierarchical fit against those from the flat (non-hierarchical) fit. Red lines are the 45-degree identity line. The fact that most points cluster around this line, with no systematic deviations from it for any parameter at either time point, suggests that the hierarchical and flat estimates are mostly the same.
fig_name = 'HDDM_par_flatvshier.jpeg'
knitr::include_graphics(paste0(fig_path, fig_name))
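A minimal ggplot sketch of how such a panel could be drawn from the flat_difference data frame built earlier (the actual figure was generated elsewhere):

```r
library(ggplot2)

# Sketch: hierarchical vs. flat estimates per parameter and time point,
# with the 45-degree identity line in red.
ggplot(flat_difference, aes(x = hierarchical, y = flat)) +
  geom_point(alpha = 0.3) +
  geom_abline(slope = 1, intercept = 0, color = "red") +
  facet_grid(time ~ par, scales = "free")
```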
Plotting the reliability estimates for the flat parameters against those for the hierarchical parameters. Red lines are 45-degree identity lines. While reliability changes somewhat for some measures, none of the changes are large, systematic, or consequential (i.e., none push reliability to an acceptable level under either estimation method).
rel_df_flat = make_rel_df(t1_df = test_hddm_flat, t2_df = retest_hddm_flat, metrics = c('icc', 'pearson', 'var_breakdown'))
rel_df_flat = rel_df_flat %>%
left_join(rel_df[,c("dv", "icc", "rt_acc", "overall_difference")], by = "dv")
fig_name = 'HDDM_rel_flatvshier.jpeg'
knitr::include_graphics(paste0(fig_path, fig_name))
with(rel_df_flat %>% filter(rt_acc == "drift rate"), t.test(icc.x, icc.y, paired=T))
Paired t-test
data: icc.x and icc.y
t = 0.53, df = 51, p-value = 0.6
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.02018 0.03454
sample estimates:
mean of the differences
0.007181
with(rel_df_flat %>% filter(rt_acc == "threshold"), t.test(icc.x, icc.y, paired=T))
Paired t-test
data: icc.x and icc.y
t = -2, df = 15, p-value = 0.06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.088179 0.002222
sample estimates:
mean of the differences
-0.04298
with(rel_df_flat %>% filter(rt_acc == "non-decision"), t.test(icc.x, icc.y, paired=T))
Paired t-test
data: icc.x and icc.y
t = -0.19, df = 13, p-value = 0.9
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.04691 0.03948
sample estimates:
mean of the differences
-0.003715
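The three paired t-tests above can equally be run in one pass (a sketch, assuming rel_df_flat as built above, where icc.x is the flat and icc.y the hierarchical reliability):

```r
library(dplyr)

# Sketch: one paired t-test of flat (icc.x) vs. hierarchical (icc.y)
# reliability per parameter type, collected into a single table.
rel_df_flat %>%
  filter(rt_acc %in% c("drift rate", "threshold", "non-decision")) %>%
  group_by(rt_acc) %>%
  summarise(mean_diff = mean(icc.x - icc.y, na.rm = TRUE),
            p_value = t.test(icc.x, icc.y, paired = TRUE)$p.value)
```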
rm(flat_difference, rel_df_flat, retest_flat_difference, retest_hddm_flat, retest_hddm_hier, test_flat_difference, test_hddm_flat, test_hddm_hier)
Do the fit statistics differ by whether the model was hierarchical or not? No.
t1_hierarchical_fitstats = t1_hierarchical_fitstats %>%
filter(subj_id %in% retest_hierarchical_fitstats$subj_id)
tmp = rbind(retest_hierarchical_fitstats, retest_flat_fitstats, t1_hierarchical_fitstats, t1_flat_fitstats) %>%
select(m_kl, subj_id, task_name, sample) %>%
separate(sample, c("time", "proc"), sep="_", remove=FALSE)
Warning in `[<-.factor`(`*tmp*`, ri, value = c(5L, 7L, 9L, 11L, 12L, 14L, :
invalid factor level, NA generated
Warning in `[<-.factor`(`*tmp*`, ri, value = c(5L, 7L, 9L, 11L, 12L, 14L, :
invalid factor level, NA generated
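The repeated "invalid factor level" warnings above come from rbind() on factor columns whose levels differ across the four fit-stat data frames. A sketch of a fix, assuming the same objects: dplyr::bind_rows unions factor levels instead of generating NAs.

```r
library(dplyr)
library(tidyr)

# Sketch: bind_rows() handles mismatched factor levels, so no rows are
# silently turned into NA before splitting `sample` into time and proc.
tmp = bind_rows(retest_hierarchical_fitstats, retest_flat_fitstats,
                t1_hierarchical_fitstats, t1_flat_fitstats) %>%
  mutate(sample = as.character(sample)) %>%
  select(m_kl, subj_id, task_name, sample) %>%
  separate(sample, c("time", "proc"), sep = "_", remove = FALSE)
```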
fig_name = 'HDDM_fitstats_flatvshier.jpeg'
knitr::include_graphics(paste0(fig_path, fig_name))
summary(lmer(m_kl ~ time*proc+(1|task_name),tmp))
Linear mixed model fit by REML ['lmerMod']
Formula: m_kl ~ time * proc + (1 | task_name)
Data: tmp
REML criterion at convergence: 30849
Scaled residuals:
Min 1Q Median 3Q Max
-2.058 -0.429 -0.199 0.163 13.706
Random effects:
Groups Name Variance Std.Dev.
task_name (Intercept) 1.11 1.05
Residual 2.55 1.60
Number of obs: 8147, groups: task_name, 14
Fixed effects:
Estimate Std. Error t value
(Intercept) 1.4481 0.2834 5.11
timet1 0.0487 0.0494 0.99
prochierarchical -0.1476 0.0498 -2.96
timet1:prochierarchical -0.0115 0.0710 -0.16
Correlation of Fixed Effects:
(Intr) timet1 prchrr
timet1 -0.087
prochrrchcl -0.086 0.495
tmt1:prchrr 0.061 -0.695 -0.699
– Do DDM parameters capture processes similar to the raw measures within a given task, or do they capture processes that are more similar across tasks? (If the former, they would be less useful than if the latter.)
This could be analyzed with factor analysis but there are more variables than observations so as a first pass we’ll explore correlations.
if(!exists('all_data_cor')){
source('/Users/zeynepenkavi/Dropbox/PoldrackLab/SRO_DDM_Analyses/code/workspace_scripts/var_cor_data.R')
}
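A hedged, self-contained sketch of how such a pair-labeled correlation table could be constructed (the real construction lives in var_cor_data.R; the function label_cor_pairs and the measure/task/type columns are my own illustrative names):

```r
library(dplyr)
library(tidyr)

# Hypothetical sketch: given a subjects-by-measures matrix and a lookup
# table giving each measure's task and type ("ddm" or "raw"), label every
# correlation by measure-type pair and by whether the measures share a task.
label_cor_pairs <- function(measure_mat, info) {
  cor_mat <- cor(measure_mat, use = "pairwise.complete.obs")
  cor_mat[upper.tri(cor_mat, diag = TRUE)] <- NA  # keep each pair once
  as.data.frame(cor_mat) %>%
    mutate(var1 = rownames(cor_mat)) %>%
    gather(var2, value, -var1) %>%
    filter(!is.na(value)) %>%
    left_join(info, by = c("var1" = "measure")) %>%
    left_join(info, by = c("var2" = "measure"), suffix = c("1", "2")) %>%
    mutate(ddm_ddm = paste(pmin(type1, type2), pmax(type1, type2), sep = "-"),
           task_task = ifelse(task1 == task2, "same task", "different tasks"))
}
```

The pmin/pmax trick makes the type pair order-invariant, yielding the three levels ("ddm-ddm", "ddm-raw", "raw-raw") used in the model below.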
fig_name = 'ddm_raw_vars_cor.jpeg'
knitr::include_graphics(paste0(fig_path, fig_name))
(Absolute) correlations between raw and ddm measures within a task are higher than those between ddm measures across tasks.
Are the correlations between ddm parameters across tasks higher than the correlations between ddm parameters and other variables from the same task? No. Correlations between variables of the same task are consistently higher.
In some sense this is good: we haven't created all of these different tasks, which putatively measure different things, for nothing.
summary(lm(log(abs(value)) ~ ddm_ddm*task_task, all_data_cor))
Call:
lm(formula = log(abs(value)) ~ ddm_ddm * task_task, data = all_data_cor)
Residuals:
Min 1Q Median 3Q Max
-12.964 -0.559 0.215 0.812 2.407
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.62444 0.00388 -676.98 < 2e-16 ***
ddm_ddmddm-raw 0.05489 0.00519 10.58 < 2e-16 ***
ddm_ddmraw-raw 0.08502 0.00732 11.62 < 2e-16 ***
task_tasksame task 0.68130 0.01291 52.79 < 2e-16 ***
ddm_ddmddm-raw:task_tasksame task -0.05512 0.01770 -3.11 0.0018 **
ddm_ddmraw-raw:task_tasksame task -0.15754 0.02530 -6.23 4.8e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.16 on 260887 degrees of freedom
Multiple R-squared: 0.0231, Adjusted R-squared: 0.023
F-statistic: 1.23e+03 on 5 and 260887 DF, p-value: <2e-16
Can we recover a 3 factor structure for -EZ and -HDDM variables that captures comparable processes across tasks?
Here is the notebook describing my current confusion.
– Are these clusters more reliable than using either the raw or the DDM measures alone?
– Do raw or DDM measures (or factor scores) predict real world outcomes better?